36 research outputs found

    Estimating the total number of phosphoproteins and phosphorylation sites in eukaryotic proteomes

    Background: Phosphorylation is the most frequent post-translational modification made to proteins and may regulate protein activity as either a molecular digital switch or a rheostat. Despite the cornucopia of high-throughput (HTP) phosphoproteomic data in the last decade, it remains unclear how many proteins are phosphorylated and how many phosphorylation sites (p-sites) can exist in total within a eukaryotic proteome. We present the first reliable estimates of the total number of phosphoproteins and p-sites for four eukaryotes (human, mouse, Arabidopsis, and yeast). Results: In all, 187 HTP phosphoproteomic datasets were filtered, compiled, and studied along with two low-throughput (LTP) compendia. Estimates of the number of phosphoproteins and p-sites were inferred by two methods: Capture-Recapture, and fitting the saturation curve of cumulative redundant vs. cumulative non-redundant phosphoproteins/p-sites. Estimates were also adjusted for different levels of noise within the individual datasets and other confounding factors. We estimate that in total, 13 000, 11 000, and 3000 phosphoproteins and 230 000, 156 000, and 40 000 p-sites exist in human, mouse, and yeast, respectively, whereas estimates for Arabidopsis were not as reliable. Conclusions: Most of the phosphoproteins have been discovered for human, mouse, and yeast, while the dataset for Arabidopsis is still far from complete. The datasets for p-sites are not as close to saturation as those for phosphoproteins. Integration of the LTP data suggests that current HTP phosphoproteomics is capable of capturing 70% to 95% of total phosphoproteins, but only 40% to 60% of total p-sites.
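
    The abstract above names Capture-Recapture as one of its two estimation methods but does not say which estimator was used. As a rough illustration only, the sketch below assumes the simplest two-sample case with Chapman's bias-corrected form of the Lincoln-Petersen estimator; the function name and the toy protein identifiers are hypothetical and are not taken from the paper.

```python
def chapman_estimate(sample_a, sample_b):
    """Two-sample capture-recapture estimate of the total population size.

    Assumption: Chapman's bias-corrected Lincoln-Petersen estimator,
    N = (n1 + 1) * (n2 + 1) / (m + 1) - 1,
    where n1 and n2 are the sample sizes and m is the overlap.
    """
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)  # items "recaptured" in the second sample
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1


# Hypothetical toy data: phosphoproteins reported by two independent datasets.
dataset_1 = {"P1", "P2", "P3", "P4", "P5", "P6"}
dataset_2 = {"P4", "P5", "P6", "P7", "P8"}

estimate = chapman_estimate(dataset_1, dataset_2)
print(estimate)  # 9.5
```

    With six and five proteins observed and three shared, the estimate is 9.5 proteins in total, which is more than the eight distinct proteins seen so far. The smaller the overlap between datasets, the larger the estimated unseen fraction.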

    An exploration of alternative visualisations of the basic helix-loop-helix protein interaction network

    Background: Alternative representations of biochemical networks emphasise different aspects of the data and contribute to the understanding of complex biological systems. In this study we present a variety of automated methods for visualisation of a protein-protein interaction network, using the basic helix-loop-helix (bHLH) family of transcription factors as an example. Results: Network representations that arrange nodes (proteins) according to either continuous or discrete information are investigated, revealing the existence of protein sub-families and the retention of interactions following gene duplication events. Methods of network visualisation in conjunction with a phylogenetic tree are presented, highlighting the evolutionary relationships between proteins, and clarifying the context of network hubs and interaction clusters. Finally, an optimisation technique is used to create a three-dimensional layout of the phylogenetic tree upon which the protein-protein interactions may be projected. Conclusion: We show that by incorporating secondary genomic, functional or phylogenetic information into network visualisation, it is possible to move beyond simple layout algorithms based on network topology towards more biologically meaningful representations. These new visualisations can give structure to complex networks and will greatly help in interpreting their evolutionary origins and functional implications. Three open source software packages (InterView, TVi and OptiMage) implementing our methods are available.

    Together we stand: genes cluster to coordinate regulation

    Although most eukaryotic genomes lack operons, clusters of functionally related genes are occasionally discovered. Now, a metabolic operon-like gene cluster has been described in Arabidopsis thaliana that is needed for triterpene synthesis.

    Target Analysis of Volatile Organic Compounds in Exhaled Breath for Lung Cancer Discrimination from Other Pulmonary Diseases and Healthy Persons

    The aim of the present study was to investigate the ability of breath analysis to distinguish lung cancer (LC) patients from patients with other respiratory diseases and healthy people. The population sample consisted of 51 patients with confirmed LC, 38 patients with pathological computed tomography (CT) findings not diagnosed with LC, and 53 healthy controls. The concentrations of 19 volatile organic compounds (VOCs) were quantified in the exhaled breath of study participants by solid phase microextraction (SPME) of the VOCs and subsequent gas chromatography-mass spectrometry (GC-MS) analysis. Kruskal–Wallis and Mann–Whitney tests were used to identify significant differences between subgroups. Machine learning methods were used to determine the discriminant power of the method. Several compounds were found to differ significantly between LC patients and healthy controls. Strong associations were identified for 2-propanol, 1-propanol, toluene, ethylbenzene, and styrene (p-values < 0.001–0.006). These associations remained significant when ambient air concentrations were subtracted from breath concentrations. VOC levels were found to be affected by ambient air concentrations and a few by smoking status. The random forest machine learning algorithm achieved a correct classification of patients of 88.5% (area under the curve, AUC, 0.94). However, none of the methods used achieved adequate discrimination between LC patients and patients with abnormal CT findings. Biomarker sets, consisting mainly of the exogenous monoaromatic compounds and 1- and 2-propanol, adequately discriminated LC patients from healthy controls. The breath concentrations of these compounds may reflect alterations in patients' physiological and biochemical status and perhaps can be used as probes for the investigation of these statuses or for normalization of patient-related factors in breath analysis.
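
    The abstract above reports classifier performance as an area under the curve (AUC) and uses Mann–Whitney tests for group comparisons. The two are closely related: the AUC equals the Mann–Whitney U statistic divided by the number of case-control pairs. The sketch below illustrates that relationship only; the function name and the scores are hypothetical and do not come from the study.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    This equals the Mann-Whitney U statistic divided by
    len(case_scores) * len(control_scores); ties count as half a pair.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores (higher means "more likely lung cancer").
lc_scores = [0.9, 0.8, 0.7, 0.4]
control_scores = [0.6, 0.3, 0.2, 0.1]

auc = auc_from_scores(lc_scores, control_scores)
print(auc)  # 0.9375
```

    Here 15 of the 16 case-control pairs are ordered correctly, giving an AUC of 0.9375. An AUC of 0.5 would mean the scores carry no ranking information.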

    The challenges of interpreting phosphoproteomics data: a critical view through the bioinformatics lens

    During the last decade, there has been great progress in high-throughput (HTP) phosphoproteomics and hundreds or even thousands of phosphorylation sites (p-sites) can now be detected in a single experiment. This success is attributable to a combination of very sensitive mass spectrometry instruments, better phosphopeptide enrichment techniques and bioinformatics software that is capable of detecting peptides and localizing p-sites. These new technologies have opened up a whole new level of gene regulation to be studied, with great potential for therapeutics and synthetic biology. Nevertheless, many challenges remain to be resolved; these concern the biases and noise of these proteomic technologies, the biological noise that is present, as well as the incompleteness of the current datasets. Despite these problems, the datasets published so far appear to represent a good sample of the complete phosphoproteome of some organisms and are capable of revealing their major properties.

    Choose your partners: dimerization in eukaryotic transcription factors

    In many eukaryotic transcription factor gene families, proteins require a physical interaction with an identical molecule or with another molecule within the same family to form a functional dimer and bind DNA. Depending on the choice of partner and the cellular context, each dimer triggers a sequence of regulatory events that lead to a particular cellular fate, for example, proliferation or differentiation. Recent syntheses of genomic and functional data reveal that partner choice is not random; instead, it is governed by dimerization specificities that are strongly linked to the evolution of the protein family. Our focus is on understanding these interaction specificities, their functional consequences and how they evolved. This knowledge is essential for understanding gene regulation and designing a new generation of drugs.